Japanese Morphological Analyzer using Word Co-occurrence - JTAG

Authors

  • Takeshi Fuchi
  • Shinichiro Takagi

Abstract

We developed a Japanese morphological analyzer that uses the co-occurrence of words to select the correct sequence of words in an unsegmented Japanese sentence. The co-occurrence information can be obtained from cases where the system incorrectly analyzes sentences. As the amount of information increases, the accuracy of the system increases with a small risk of degradation. Experimental results show that the proposed system assigns the correct phonological representations to unsegmented Japanese sentences more precisely than do other popular systems.
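The abstract does not describe JTAG's actual scoring or learning procedure. The following is therefore only a minimal Python sketch of the general idea, built on stated assumptions. It uses a hypothetical toy dictionary with made-up word costs. A candidate segmentation's score is its total word cost minus a fixed bonus for every learned co-occurring word pair it contains. A hypothetical `learn_from_error` rule runs after a wrong analysis and links each missed word to the other words of the correct analysis. The kana string かれがくるまでまつ can be read as かれ/が/くるま/で/まつ ("he waits in the car") or かれ/が/くる/まで/まつ ("(I) wait until he comes"). Here the second reading is assumed to be the intended one.

```python
from itertools import combinations

# Hypothetical toy dictionary (word -> cost, lower is better).
# Words and costs are illustrative assumptions, not JTAG's data.
DICTIONARY = {
    "かれ": 3,    # "he"
    "が": 1,      # subject particle
    "くるま": 2,  # "car"
    "くる": 3,    # "come"
    "で": 2,      # particle "in/by"
    "まで": 2,    # particle "until"
    "まつ": 3,    # "wait"
}


def segmentations(text):
    """Yield every split of text into dictionary words (exhaustive; fine for short toy input)."""
    if not text:
        yield []
        return
    for end in range(1, len(text) + 1):
        head = text[:end]
        if head in DICTIONARY:
            for rest in segmentations(text[end:]):
                yield [head] + rest


def score(words, cooc, bonus=1):
    """Total word cost minus `bonus` for each learned pair whose two words both occur."""
    cost = sum(DICTIONARY[w] for w in words)
    pairs = {frozenset(p) for p in combinations(set(words), 2)}
    return cost - bonus * len(pairs & cooc)


def analyze(text, cooc):
    """Return the lowest-scoring segmentation (raises ValueError if none exists)."""
    return min(segmentations(text), key=lambda ws: score(ws, cooc))


def learn_from_error(predicted, correct, cooc):
    """Hypothetical update: link each word the system missed to every other word of the correct analysis."""
    missed = set(correct) - set(predicted)
    for w in missed:
        for other in set(correct) - {w}:
            cooc.add(frozenset((w, other)))


cooc = set()
sentence = "かれがくるまでまつ"
gold = ["かれ", "が", "くる", "まで", "まつ"]  # assumed intended reading

before = analyze(sentence, cooc)
print("/".join(before))  # → かれ/が/くるま/で/まつ (cost 11 beats 12)
if before != gold:
    learn_from_error(before, gold, cooc)
after = analyze(sentence, cooc)
print("/".join(after))   # → かれ/が/くる/まで/まつ (12 - 7 = 5 beats 11)
```

A learned pair only changes the score of a candidate that contains both of its words, so corrections learned from one sentence leave unrelated sentences untouched. This loosely mirrors the abstract's claim that accuracy grows with little risk of degradation. Enumerating every segmentation is exponential in the worst case, so a practical analyzer would search a word lattice instead.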

Similar Resources

A Morphological Analyzer for Japanese Nouns, Verbs and Adjectives

We present an open source morphological analyzer for Japanese nouns, verbs and adjectives. The system builds upon the morphological analyzing capabilities of MeCab [Matsumoto et al., 1999] to incorporate finer details of classification such as politeness, tense, mood and voice attributes. We implemented our analyzer in the form of a finite state transducer using the open source finite state com...

The Mega-Word Tagged-Corpus Project

Large corpora with part-of-speech tagging play a very important role in recent statistics-based and example-based natural language processing systems. However, no such corpora have become widely available for Japanese so far. Because the Japanese language has no explicit word boundaries, it is impossible even to count words without a corpus that has at least word segmentations. This paper descr...

HMM Parameter Learning for Japanese Morphological Analyzer

This paper presents a method to apply Hidden Markov Model (HMM) to parameter learning for Japanese morphological analyzer. We especially emphasize how the following two information sources affect the results of the parameter learning: 1) The initial value of parameters, i.e., the initial probabilities and 2) some grammatical constraints that hold in Japanese sentences independently of any domai...

Mostly-Unsupervised Statistical Segmentation of Japanese: Applications to Kanji

Given the lack of word delimiters in written Japanese, word segmentation is generally considered a crucial first step in processing Japanese texts. Typical Japanese segmentation algorithms rely either on a lexicon and grammar or on pre-segmented data. In contrast, we introduce a novel statistical method utilizing unsegmented training data, with performance on kanji sequences comparable to and s...

Example-Based Correction of Word Segmentation and Part of Speech Labelling

This paper describes an example-based correction component for Japanese word segmentation and part of speech labelling (AMED), and a way of combining it with a pre-existing rule-based Japanese morphological analyzer and a probabilistic part of speech tagger. Statistical algorithms rely on frequency of phenomena or events in corpora; however, low frequency events are often inadequately represent...

Publication date: 1998